Cross-Language Information Retrieval for Technical Documents

Authors

  • Atsushi Fujii
  • Tetsuya Ishikawa
Abstract

This paper proposes a Japanese/English cross-language information retrieval (CLIR) system targeting technical documents. Our system first translates a given query containing technical terms into the target language, and then retrieves documents relevant to the translated query. The translation of technical terms is still problematic in that technical terms are often compound words, and thus new terms can be progressively created simply by combining existing base words. In addition, Japanese often represents loanwords based on its phonograms. Consequently, existing dictionaries find it difficult to achieve sufficient coverage. To counter the first problem, we use a compound word translation method, which uses a bilingual dictionary for base words and collocational statistics to resolve translation ambiguity. For the second problem, we propose a transliteration method, which identifies phonetic equivalents in the target language. We also show the effectiveness of our system using a test collection for CLIR.
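The compound word translation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, so the product of smoothed collocation counts used here is an assumption, and the dictionary entries, counts, and the function name `translate_compound` are all hypothetical toy values chosen for illustration.

```python
from itertools import product

# Hypothetical toy base-word dictionary (romanized Japanese -> English candidates).
BASE_DICT = {
    "jouhou": ["information", "intelligence"],
    "kensaku": ["retrieval", "search"],
}

# Hypothetical collocation counts for adjacent English word pairs.
COLLOCATION = {
    ("information", "retrieval"): 120,
    ("information", "search"): 30,
    ("intelligence", "retrieval"): 1,
    ("intelligence", "search"): 2,
}


def translate_compound(base_words):
    """Pick the candidate translation with the highest collocation score.

    Assumption: the score is the product of add-one-smoothed counts of
    adjacent word pairs; the source does not state the actual formula.
    """
    # Enumerate every combination of base-word translations.
    candidates = product(*(BASE_DICT[word] for word in base_words))

    def score(candidate):
        total = 1
        for pair in zip(candidate, candidate[1:]):
            # Add-one smoothing so unseen pairs do not zero out the product.
            total *= COLLOCATION.get(pair, 0) + 1
        return total

    return " ".join(max(candidates, key=score))


print(translate_compound(["jouhou", "kensaku"]))
```

In this sketch a base word missing from `BASE_DICT` raises a `KeyError`; in a system like the one the abstract describes, that is the case where a transliteration step for loanwords would be expected to take over.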

Similar resources

Applying Machine Translation to Two-Stage Cross-Language Information Retrieval

Cross-language information retrieval (CLIR), where queries and documents are in different languages, needs a translation of queries and/or documents, so as to standardize both of them into a common representation. For this purpose, the use of machine translation is an effective approach. However, computational cost is prohibitive in translating large-scale document collections. To resolve this pr...

Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration

Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system, where we combine query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our sy...

Extraction of Training Sets for Experimentation with Cross Language Information Retrieval Systems

In this paper we focus on methods, models and tools for the extraction of bilingual training / test sets useful for the (semi) automatic classification of textual documents. Such documents could be tutorials, technical specifications, articles, personal notes, etc. Another motivation for our research is the need for managing corpus of classified texts and especially parallel corpora (texts). We...

Applying Machine Translation to Two-Stage Cross-Language Information

Cross-language information retrieval (CLIR), where queries and documents are in different languages, needs a translation of queries and/or documents, so as to standardize both of them into a common representation. For this purpose, the use of machine translation is an effective approach. However, computational cost is prohibitive in translating large-scale document collections. To resolve this ...

An Approach to Cross-Age and Cross-Cultural Information Access for Digital Humanities

1. Introduction Since libraries have collection of documents across age and culture, and even language, the libraries are inherently multi-age, multi-cultural, and multilingual. In the digital age, more and more historical documents are being digitized to preserve contents written in deteriorating papers. Library, etc.). It means that more and more old text contents will be accessible on the in...

Journal title:
  • CoRR

Volume cs.CL/9907007  Issue 

Pages  -

Publication date 1999